ABSTRACT


This thesis formulates predictions for Recruit Training Command (RTC) Great
Lakes’ recruit graduation rates based on two econometric approaches. The Navy’s recruit
graduation rates exhibit pronounced seasonal and long-term behaviors, which tend to
cause logistical problems at RTC. Modeling and forecasting RTC graduation rates is
therefore an important management tool that could facilitate future planning for both
RTC Great Lakes and the US Navy.

First, the multiplicative decomposition method is employed to produce a model. As
an alternative method, we use the autoregressive integrated moving average (ARIMA)
process to describe the data. In both instances, satisfactory forecasting results are
attained.
I. INTRODUCTION


The Recruit Training Command (RTC) in Great Lakes, Illinois is home to the U.S.
Navy’s recruit training and is the largest training center in the Navy. Since its founding in
1911, RTC has prepared men and women for duty in the naval service. With the closures
of RTC Orlando and RTC San Diego in 1994, RTC Great Lakes has been the sole source of
recruit training [http://www.ntcpao.com/].

The size of the recruit population at RTC Great Lakes has been and continues to be
influenced by external forces. These factors, such as high school graduation dates and
the seasons of the year, cause a cyclical inflow of new personnel reporting for basic
training. Over sixty percent of the year’s accessions arrive between July and November,
which causes a host of logistical and ultimately financial difficulties for RTC [Executive
Officer, August 1998].

Examples of such difficulties include the placement of Recruit Division
Commanders (RDCs) and support staff. While these personnel are fully employed
during the peak months, or “surge,” the low number of recruits from March to June
causes many RDCs to assume administrative duties or be reassigned to other tasks.
Conversely, RDCs are in high demand during the peak months, and staff billets often
go unfilled. Other major cost centers affected by this cyclical phenomenon are the
berthing and messing functions. RTC Great Lakes has capacity for only approximately
1500 recruits at any one time, a constraint imposed by the physical limitations of the
base itself [Data Control Officer, September 1998].





A. THESIS OBJECTIVE 

The purpose of this thesis is to model the recruit population, as measured by the
graduation rate. The graduation rate is of particular interest to the Navy and correlates
highly with the accession of new recruits into RTC. Once the graduation rate has been
modeled mathematically, the model can be used to predict graduation rates from one to
many months in the future. Such knowledge can help RTC Great Lakes and the US Navy
in future manpower planning.


B. THESIS ORGANIZATION 

This thesis begins with a presentation of the number of graduates per month
provided by RTC Great Lakes, followed by a time-series analysis of the data. First, the
decomposition method is discussed and then applied in an attempt to describe the
data. This is followed by the autoregressive integrated moving average (ARIMA)
method, its results, and its use in forecasting. Conclusions and recommendations follow.








II. DESCRIPTION OF DATA


The data on graduates from RTC Great Lakes start in October 1994 and conclude in
July 1998 [Data Control Officer, September 1998]. October 1994 represents the first
period in which Great Lakes became the sole source of recruit training
[http://www.ntcpao.com/]. Including earlier years’ data would raise the possibility of
inconsistent data, as those figures do not reflect the total number of recruits.
The data consist of a series of equally spaced monthly observations. This is the
underlying definition of a time series, in which the phenomenon in question is a function
of time. A graphical presentation of the above data shows the volatility of RTC Great
Lakes’ graduation rates. The data appear to have a seasonal nature, with a period of
approximately twelve months. The data also exhibit increasing instability, or a more
pronounced “seasonal” effect, over time. The connecting line between discrete points is
for illustrative purposes only.
III. TIME SERIES ANALYSIS


A. DECOMPOSITION METHOD 

1. Introduction 

A typical time series data set can be considered an aggregate of four distinct
components. The simplest to understand is the so-called long-term trend, which we shall
designate T. This trend can be negative, positive, or, in the case of neither, flat. In
any event, it may be represented by the linear regression line of a data set, the so-called
line of best fit. The regression line is found by minimizing the sum of squared
errors between the data set and a straight line of the form y = mx + b.
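The following minimal Python sketch illustrates the least-squares criterion just described by computing the slope m and intercept b in closed form. Python and the function name `fit_line` are assumptions made for illustration only. The thesis itself used a spreadsheet and MINITAB.

```python
def fit_line(xs, ys):
    """Return (m, b) minimizing sum((y - (m*x + b))**2) over the data points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Closed-form ordinary least squares: slope = Sxy / Sxx.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    m = sxy / sxx
    b = y_bar - m * x_bar          # the fitted line passes through (x_bar, y_bar)
    return m, b


# Points lying exactly on y = 2x + 1 recover m = 2 and b = 1.
m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```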

Another component is seasonal variation. This behavior is typified by a data set’s
change in values according to the time of year or another seasonal regularity such as the
weather. Seasonal variation, or S, is repetitive in nature and very similar to cyclical
variation, C. The distinction between seasonal and cyclical variation lies in the fact that
seasonal variation has specific, fixed time intervals, and cyclical variation does not.
Cyclical variation can last any length of time, as in a business cycle. Also of
consequence in all time series analysis is random variation, R. Random variation
accounts for the lack of any identifiable data pattern, and is almost always present to
some extent in any real set of data points [Gujarati, 1995].

Time series data can be viewed as a combination of the above behaviors [Gujarati, 
1995]. Mathematically speaking, if we consider the variable Y to be the phenomenon 
under observation, Y may be expressed as the product of the four aforementioned
behavior patterns:

$$Y = T \cdot S \cdot C \cdot R \qquad (1)$$
Where 

T = long-term trend, 

S = seasonal variation, 

C = cyclical variation, and 

R = random variation. 

This model captures all of the aforementioned behavior patterns. Since Equation (1)
is a multiplicative model, these components are superimposed on each other
multiplicatively, forming an aggregate pattern. Equation (1) allows each component to be
manipulated or isolated [Gujarati, 1995].


2. Moving Averages 

Paramount to the decomposition method is the calculation of the data’s moving
average, MA. To obtain accurate figures, we use a centered moving average, which is
aligned with the middle of the data points it spans. Since we are using monthly data,
we employ a twelve-period centered moving average of the form

$$MA_i = \frac{Y_{i-6} + 2\left(Y_{i-5} + Y_{i-4} + \cdots + Y_{i+4} + Y_{i+5}\right) + Y_{i+6}}{24} \qquad (2)$$
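Equation (2) can be sketched as follows. The sketch assumes the monthly series is held in a Python list whose first element is period 1 (October 1994). The helper name `centered_moving_average` is hypothetical. The two end points of each 13-month window receive half weight, which is equivalent to dividing the bracketed sum in Equation (2) by 24.

```python
def centered_moving_average(y, period=12):
    """Centered moving average of Equation (2); None where no full window exists.

    For an even period the window spans period + 1 points: the two end
    points get weight 1/2, the inner points weight 1, and the total is
    divided by period (equivalently, end points 1, inner points 2, over 2*period).
    """
    if period % 2:
        raise ValueError("this sketch handles even periods only")
    half = period // 2
    out = [None] * len(y)
    for i in range(half, len(y) - half):
        inner = sum(y[i - half + 1 : i + half])   # Y_{i-5} .. Y_{i+5} for period 12
        out[i] = (0.5 * y[i - half] + inner + 0.5 * y[i + half]) / period
    return out
```

With 46 monthly observations, this sketch produces values only for periods 7 through 40. The first and last six months have no complete centered twelve-month window.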





By employing spreadsheets, the moving averages for RTC Great Lakes’ graduate data,
October 1994 to July 1998, are calculated as follows:














RTC GRADUATES, OCT94 - JUL98
[Table: Period | Month | Graduates | Moving Avg]









The use of moving averages smooths short-term fluctuations by averaging out any data
point that may be unusually high or low [Judge et al., 1985]. Since each moving average
covers a complete cycle of observation, in our case twelve months, it can be considered
a product of the data’s long-term trend and cyclical variation [Gujarati, 1995]:

$$MA = T \cdot C \qquad (3)$$

Substituting Equation (3) into Equation (1) gives $Y = MA \cdot S \cdot R$, or

$$Y / MA = S \cdot R \qquad (4)$$











The ratio Y/MA is called the actual-to-moving-average ratio. It is an important
relationship because it isolates the seasonal and random variation [Gujarati, 1995]. Said
another way, the actual-to-moving-average ratio contains only seasonality and randomness.


3. Seasonality 
We can now de-seasonalize the data. This is done by averaging the actual-to-moving-average
ratios found previously, by month across all years, to obtain the seasonal indices, S.
Each seasonal index corresponds to a specific month and is found in the last column:












Y/MA TABLE -- DETERMINATION OF SEASONAL INDICES 
As the sum of the monthly means (13.1478) does not equal 12, the indices are adjusted by
multiplying each average by the quotient 12.0000/13.1478, so that the adjusted indices
sum to twelve, the number of months in a year. We next obtain the de-seasonalized data.
Dividing both sides of Equation (1) by S, we obtain the equation:

$$Y / S = T \cdot C \cdot R \qquad (5)$$
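Below is a minimal sketch of the seasonal-index step. It assumes Python lists in which position 0 is October 1994, so index 0 of the result is the October index. It also assumes that `None` marks months without a moving average. The function names are hypothetical. Each index is the mean actual-to-moving-average ratio for its month, rescaled so that the twelve indices sum to 12, as in the 12.0000/13.1478 adjustment above. Equation (5) then divides each observation by its month's index.

```python
def seasonal_indices(y, ma, period=12):
    """Average the Y/MA ratios by month, then rescale so the indices sum to `period`."""
    sums = [0.0] * period
    counts = [0] * period
    for i, (obs, avg) in enumerate(zip(y, ma)):
        if avg is not None:
            sums[i % period] += obs / avg          # actual-to-moving-average ratio, S*R
            counts[i % period] += 1
    if 0 in counts:
        raise ValueError("every month needs at least one Y/MA ratio")
    raw = [s / c for s, c in zip(sums, counts)]    # unadjusted monthly means
    scale = period / sum(raw)                      # e.g. 12.0000 / 13.1478 in the text
    return [r * scale for r in raw]


def deseasonalize(y, indices, period=12):
    """Equation (5): divide each observation by its month's seasonal index."""
    return [obs / indices[i % period] for i, obs in enumerate(y)]
```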


Our original data now take the form: 
These de-seasonalized values can be represented graphically:
The estimate of the trend line T is found by using the de-seasonalized data. The
least-squares equation for T is obtained by a simple linear regression calculated with the
MINITAB release 12.1 software package:


Regression Analysis

The regression equation is
Y/S = 2743 + 33.2 Period

Predictor       Coef     StDev       T       P
Constant      2743.4     174.3   15.74   0.000
Period        33.223     6.964    4.77   0.000




















The graph and table of de-seasonalized data and the least squares equation values follow. 


The trend line values follow readily from the least squares equation and are computed 


using a spreadsheet. 
4. Forecasting
Having obtained the adjusted seasonal indices and trend line values, we can construct
a forecasting model of the form [Gujarati, 1995]

$$\hat{Y} = S \cdot T, \quad \text{or} \quad \hat{Y} = S \cdot (2743 + 33.2 \cdot \text{Period}) \qquad (6)$$
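Once the trend coefficients are known, Equation (6) reduces to a one-line calculation. The sketch below hard-codes the fitted values 2743 and 33.2 from the regression output. The function names are hypothetical, and the caller must supply the seasonal index.

```python
TREND_INTERCEPT = 2743.0   # constant from the MINITAB regression of Y/S on Period
TREND_SLOPE = 33.2         # additional graduates per month


def trend(period):
    """Trend line T evaluated at a period number (period 1 = October 1994)."""
    return TREND_INTERCEPT + TREND_SLOPE * period


def decomposition_forecast(period, seasonal_index):
    """Equation (6): Y-hat = S * T."""
    return seasonal_index * trend(period)


# Period 28 (January 1997) with the seasonal index 1.1824 used for it in the
# source's forecast table: 3672.6 * 1.1824 is about 4342 graduates.
print(round(decomposition_forecast(28, 1.1824)))
```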





The resulting values from Equation (6) are as follows:

RTC GRADUATES, OCT94-JUL98
[Table: Period | Graduates | Seasonal | Forecast]


The resultant values of Equation (6) and actual observed values are represented 


graphically: 
We can extend Equation (6) to establish a forecast of future RTC Great Lakes
graduation rates for the multiplicative decomposition method:






RTC GREAT LAKES GRADUATE FORECAST 
A graph of actual and expected values, including forecast figures, appears as follows:
5. Forecast Error
The underlying assumption in any time series forecast is that the time series will
behave in the future as it did in the past. A point forecast, which corresponds to a
discrete data point, represents the best prediction of the value of the variable in question
at a given point in the future. It is our “best guess” for the future value of the variable
[Harvey, 1993].


To test the validity of the decomposition model, we performed a similar analysis,
this time with only forty observed values covering October 1994 through January 1998.
As expected, different moving averages and seasonal indices were produced. A forecast
was then made for the remaining six observed periods, February 1998 through July 1998,
and that forecast was compared to the actual observed values for that time period. This
procedure, referred to here as a back forecast (it is also commonly called an out-of-sample
or holdout test), is used as a preliminary litmus test of the model under review
[Box and Jenkins, 1970].


[Figure: Multiplicative Decomposition Back Forecasting Results, FEB98 - JUL98; observed vs. forecast, vertical axis magnitude]





A visual inspection of the back forecast provides an indication that the multiplicative
decomposition model is appropriate, since the back forecast values appear to track the
actual observations. The degree of appropriateness, or amount of quantifiable error
inherent in our model, is discussed shortly.


Unfortunately, all attempts at forecasting involve some degree of uncertainty, which
increases the further one is removed from the origin of the forecast, period t [Box and
Jenkins, 1970]. Unpredictable fluctuations inherent in the data imply that some forecasting
error must be expected. A large variance σ² in these fluctuations will limit the accuracy
of our forecasts [Bowerman and O’Connell, 1979]. Conversely, a smaller variance of the
irregular component of the data allows us to forecast with greater confidence. Another
source of forecast error is the type and specification of the forecast model itself. The
accuracy with which we derive or select the components of the time series model
influences the error inherent in our forecast [Bowerman and O’Connell, 1979]. The better
the model describes the data, the smaller the forecasting error.

An examination of forecast errors over a large period of time can reveal whether the
forecasting technique used is relevant. In the case of decomposition, we should expect
that all seasonal, trend, and cyclical components of the data have been eliminated, leaving
only a random component [Bowerman and O’Connell, 1979]. This can be checked
graphically with the residual plot below. Residual values are the differences between
observed values and the values predicted by the forecast equation. For a forecast method
to be accurate, its residual plot should exhibit no discernible pattern. In the following data
we have not identified a distinct pattern over time:
MULTIPLICATIVE DECOMPOSITION MSE CALCULATION
[Table: Period | Month | Observed Value | Forecast Value | Error | Squared Error]
By itself, the MSE figure for the multiplicative decomposition method tells us little.
When compared with the MSE of other forecast methods, however, it can aid the
selection of a forecast technique or model [Bowerman and O’Connell, 1979], with
lower MSE scores being preferable [Kennedy, 1979]. We compare the multiplicative
decomposition method’s MSE with that of another model shortly. For now, the back
forecast and residual plot suggest that the multiplicative decomposition method is in
itself a relevant model that can be used to forecast RTC Great Lakes’ future graduation
rates adequately.
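The mean squared error used for model comparison can be sketched as below. Python and the name `mean_squared_error` are assumptions. The usage line takes the March and April 1998 rows of the ARIMA(213) back-forecast table that appears later in this chapter.

```python
def mean_squared_error(observed, forecast):
    """Average of the squared forecast errors (observed - forecast)."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [o - f for o, f in zip(observed, forecast)]
    return sum(e * e for e in errors) / len(errors)


# Errors of 706 and 609 give (498,436 + 370,881) / 2.
mse = mean_squared_error([2279, 1998], [1573, 1389])
```

When two models are scored on the same hold-out months, the one with the lower value is preferred, as stated above.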
B. AUTOREGRESSIVE INTEGRATED MOVING AVERAGE METHOD

1. Introduction 

The use of mathematical models to describe the behavior of a particular phenomenon
has been thoroughly established [Harvey, 1993]. One might use equations to calculate an
object’s trajectory through space or pH levels in a chemical process. No process is
entirely deterministic, however, as unknown factors tend to wreak havoc with
deterministic models and equations. It is important to recognize that randomness is
always present to some extent in a data set. Deterministic models lack the ability to
quantify or codify outside forces into a coherent mathematical expression. For example,
an investor can know virtually everything about a corporation and possess the latest
financial data. Even so, accurately forecasting the corporation’s stock price on a
daily basis is for all practical purposes impossible.

While it may prove futile to write a deterministic model that exactly calculates the
future behavior of a probabilistic process, it may be possible to derive an expression
that models the data within specified limits [Harvey, 1993]. Such a probabilistic process is
also referred to as a stochastic process. A stochastic model defines a mechanism that is
regarded as capable of generating the observed values in question [Harvey, 1993].


2. Mathematical Terms and Expressions


a. Indices

$t$: time
$t+l$: future time $l$ units distant

b. Operators

$B$: backward-shift operator
$\nabla$: backward-difference operator
$\phi(B)$: autoregressive operator
$\varphi(B)$: generalized nonstationary autoregressive operator
$\theta(B)$: moving average operator

c. Data

$Z_t$: graduates in current month $t$

d. Variables

$\phi_p$: autoregressive variable of order $p$
$\varphi_j$: generalized nonstationary autoregressive variable of order $j$
$\theta_q$: moving average variable of order $q$
$a_t$: shock or noise at time $t$
$\tilde{Z}_t$: deviation from the mean level $\mu$ at time $t$ ($\tilde{Z}_t = Z_t - \mu$)
$\hat{Z}_t(l)$: forecast made at origin $t$ of the graduates $Z_{t+l}$ at future time $t+l$


The backward-shift operator B is defined by

$$B Z_t = Z_{t-1}.$$

More generally, we can say

$$B^m Z_t = Z_{t-m}.$$

The backward-difference operator ∇ is defined by

$$\nabla Z_t = Z_t - Z_{t-1} = (1 - B) Z_t.$$
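The two operators can be mimicked on a Python list, as the sketch below shows. The function names are hypothetical, and representing missing lagged values as `None` is a design choice of this sketch. Differencing shortens the series, so 46 monthly observations give 45 differences. This matches the "after differencing 45" line in the MINITAB output later in the chapter.

```python
def backshift(z, m=1):
    """B^m Z_t = Z_{t-m}: shift the series back m steps, padding the start with None."""
    if m < 0:
        raise ValueError("m must be non-negative")
    m = min(m, len(z))
    return [None] * m + list(z[: len(z) - m])


def difference(z, d=1):
    """Apply the backward-difference operator d times: (1 - B)^d Z_t.

    Each pass drops the first value, which has no predecessor.
    """
    out = list(z)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out
```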
3. Autoregressive Processes
A stochastic model that has proven particularly useful in the representation of
certain time series data is the autoregressive model [Box and Jenkins, 1970]. In this
model, the current value of the process, $Z_t$, is expressed as a finite, linear aggregate of
previous values of the process and a shock $a_t$. Using only the immediately preceding value
of the process, we write a first-order autoregressive process, designated AR(1) [Box and
Jenkins, 1970]:

$$Z_t = \phi_1 Z_{t-1} + a_t, \qquad t = 1, \ldots, T \qquad (7)$$

In the case of AR(1), the model depends only on the previous value of the data.
Likewise, the second-order autoregressive process, AR(2), is defined by

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + a_t, \qquad t = 1, \ldots, T \qquad (8)$$

In general, we may write an expression for an autoregressive process of order p:

$$Z_t = \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \phi_3 Z_{t-3} + \cdots + \phi_p Z_{t-p} + a_t, \qquad t = 1, \ldots, T \qquad (9)$$
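To make Equation (9) concrete, the sketch below generates an AR(p) series from a sequence of shocks. It is illustrative only. The function name, the zero default for starting values, and the example coefficients are assumptions, not estimates from the RTC data.

```python
def simulate_ar(phi, shocks, initial=None):
    """Generate Z_t = phi_1 Z_{t-1} + ... + phi_p Z_{t-p} + a_t (Equation (9)).

    `initial` holds the p starting values Z_{1-p}, ..., Z_0, oldest first
    (zeros if omitted).
    """
    p = len(phi)
    history = list(initial) if initial is not None else [0.0] * p
    if len(history) != p:
        raise ValueError("initial must contain exactly p values")
    z = []
    for a in shocks:
        # history[-1] is Z_{t-1}, history[-2] is Z_{t-2}, and so on.
        value = sum(phi[k] * history[-1 - k] for k in range(p)) + a
        z.append(value)
        history.append(value)
    return z
```

Take φ1 = 0.5, no shocks, and a starting value of 8. The series then halves each month (4, 2, 1, 0.5), which shows how an AR(1) process carries each value forward into the next.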


It is possible to determine the appropriateness of the autoregressive process for the
time series in question by means of the data’s autocorrelation function (ACF) graph.
Autocorrelation describes the mutual dependence among values of the same variable $Z_t$
at different periods. If the data set contains purely random values, the autocorrelation
among successive values will be close to or equal to 0. Conversely, data that exhibit a
definite dependence on previous values of the variable $Z_t$ will be highly correlated [Box
and Jenkins, 1970]. The data plot that follows shows the ACF for RTC Great Lakes’
graduation data. The gradual decrease of the autocorrelation coefficients, as opposed to
a sudden drop to 0, suggests the appropriateness of the AR model for the RTC Great
Lakes data [Box and Jenkins, 1970]. The ACF graph is the product of the MINITAB
software package, release 12.1.


Autocorrelation Function: ACF of RTC Great Lakes Graduates

Lag |    ACF | Lag |    ACF | Lag |    ACF
  1 |  0.643 |  13 |  0.242 |  25 |  0.141
  2 |  0.274 |  14 | -0.009 |  26 | -0.052
  3 | -0.060 |  15 |  0.146 |  27 | -0.107
  4 | -0.138 |  16 | -0.128 |  28 | -0.086
  5 | -0.122 |  17 | -0.067 |  29 | -0.078
  6 | -0.141 |  18 | -0.061 |  30 | -0.139
  7 | -0.180 |  19 | -0.170 |  31 | -0.216
  8 | -0.215 |  20 | -0.172 |  32 | -0.238
  9 | -0.064 |  21 | -0.124 |  33 | -0.192
 10 |  0.154 |  22 |  0.039 |  34 | -0.056
 11 |  0.406 |  23 |  0.180 |  35 |  0.015
 12 |  0.416 |  24 |  0.240 |     |
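The coefficients in the listing are sample autocorrelations. Below is a minimal sketch of the usual estimator, in which each autocovariance is summed over the available pairs and divided by n, and the lag-k value is then divided by the lag-0 value. Whether MINITAB 12.1 uses exactly this normalization is an assumption, and the function name is hypothetical.

```python
def sample_acf(z, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag using the divide-by-n estimator."""
    n = len(z)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(z) - 1")
    mean = sum(z) / n
    dev = [v - mean for v in z]
    c0 = sum(d * d for d in dev) / n          # lag-0 autocovariance (variance)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation undefined")
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf
```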
4. Moving Average Processes

Another model, the so-called moving average process, expresses $Z_t$ as a linear
combination of the current shock $a_t$ and a finite number q of previous shocks
$a_{t-1}, a_{t-2}, \ldots, a_{t-q}$. This is referred to as a moving average (MA) process of
order q, or MA(q). The general form of the process is written [Box and Jenkins, 1970]:

$$Z_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}, \qquad t = 1, \ldots, T \qquad (10)$$

Moving average models imply that what occurs at time t is not directly influenced by
previously observed values of the variable in question, only by the current and previous
shocks, nor will it be influenced by future events. The shocks $a_t$ themselves are assumed
to form a white noise sequence [Box and Jenkins, 1970].
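The sketch below generates Equation (10) directly from a list of shocks. It treats shocks before the start of the sample as zero, which is an assumption of the sketch, and the function name is hypothetical. Feeding it a single unit shock shows that the shock's effect disappears after q periods. This is why the ACF of a pure MA(q) process cuts off after lag q.

```python
def simulate_ma(theta, shocks):
    """Generate Z_t = a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q} (Equation (10)).

    Shocks before the start of the sample are taken to be zero.
    """
    q = len(theta)
    z = []
    for t, a in enumerate(shocks):
        value = a
        for k in range(1, q + 1):
            if t - k >= 0:
                value -= theta[k - 1] * shocks[t - k]
        z.append(value)
    return z


# A single unit shock passed through an MA(2) filter: nonzero for q + 1 periods only.
impulse_response = simulate_ma([0.4, 0.2], [1.0, 0.0, 0.0, 0.0])
```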

Like its counterpart the AR process, the appropriateness of the MA process for a
data set may be judged by means of a graph, in this case the partial autocorrelation
function (PACF) plot [Box and Jenkins, 1970]. The PACF for RTC Great Lakes
graduation data follows. Note the coefficients’ gradual reduction. This data plot is also
generated by the MINITAB software package.
Partial Autocorrelation Function: PACF of RTC Great Lakes Graduates

Lag |   PACF | Lag |   PACF | Lag |   PACF
  1 |  0.643 |  13 | -0.107 |  25 | -0.044
  2 | -0.239 |  14 | -0.087 |  26 | -0.160
  3 | -0.226 |  15 | -0.005 |  27 |  0.052
  4 |  0.129 |  16 |  0.055 |  28 | -0.018
  5 | -0.025 |  17 |  0.057 |  29 | -0.190
  6 | -0.191 |  18 | -0.016 |  30 |  0.021
  7 | -0.063 |  19 | -0.135 |  31 |  0.034
  8 | -0.051 |  20 |  0.032 |  32 | -0.044
  9 |  0.202 |  21 | -0.118 |  33 | -0.137
 10 |  0.157 |  22 | -0.034 |  34 | -0.019
 11 |  0.235 |  23 |  0.101 |  35 | -0.156
 12 | -0.040 |  24 |  0.131 |  36 | -0.091
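The PACF values can also be obtained from the autocorrelations with the Durbin-Levinson recursion. In that recursion, the lag-k partial autocorrelation is the last coefficient of an AR(k) model fitted to the ACF. The sketch below is a plain-Python illustration with a hypothetical function name. It is not a description of MINITAB's internal method.

```python
def pacf_from_acf(r):
    """Partial autocorrelations phi_kk from autocorrelations r = [r_1, r_2, ...].

    Durbin-Levinson recursion: prev[j] holds phi_{k-1, j+1}.
    """
    pacf = []
    prev = []
    for k in range(1, len(r) + 1):
        # phi_kk = (r_k - sum phi_{k-1,i} r_{k-i}) / (1 - sum phi_{k-1,i} r_i)
        num = r[k - 1] - sum(prev[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        # phi_{k,i} = phi_{k-1,i} - phi_kk * phi_{k-1,k-i}, then append phi_kk.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


partials = pacf_from_acf([0.643, 0.274, -0.060])
```

Given the first three autocorrelations from the ACF listing (0.643, 0.274, -0.060), the sketch returns approximately 0.643, -0.238 and -0.226. These are close to the MINITAB values above. The small difference at lag 2 comes from the rounded inputs.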


5. Mixed Autoregressive-Moving Average Models

To obtain greater flexibility in modeling time series data, it is usually advantageous
to include both autoregressive (AR) and moving average (MA) terms in the stochastic
model [Box and Jenkins, 1970]. Combining Equation (9) and Equation (10) provides a
mixed process of AR and MA elements known as an autoregressive-moving average
process of order (p,q), or ARMA(p,q):

$$Z_t = \phi_1 Z_{t-1} + \cdots + \phi_p Z_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \qquad (11)$$

Because differencing removes a constant mean, $\nabla^d \tilde{Z}_t = \nabla^d Z_t$ for d = 1, so
the model may be written in terms of $Z_t$ rather than the deviations $\tilde{Z}_t$ [Box and
Jenkins, 1970].

Equation (11) portrays the dependent variable as a function not only of previous
observations but also of previous shocks caused by ambient noise. Although linear, this
equation is highly effective in modeling a wide array of behavior patterns [Box and
Jenkins, 1970].
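Equation (11) can be simulated by adding the autoregressive and moving average terms in a single loop. The loop also takes an optional constant of the kind introduced later in Equation (13). This is a hypothetical helper with illustrative coefficients. Values and shocks before the start of the sample are taken as zero.

```python
def simulate_arma(phi, theta, shocks, constant=0.0):
    """Generate Z_t = c + sum_i phi_i Z_{t-i} + a_t - sum_j theta_j a_{t-j}.

    With constant = 0 this is Equation (11). Pre-sample values and shocks are zero.
    """
    z = []
    for t, a in enumerate(shocks):
        value = constant + a
        for i, ph in enumerate(phi, start=1):      # autoregressive part
            if t - i >= 0:
                value += ph * z[t - i]
        for j, th in enumerate(theta, start=1):    # moving average part
            if t - j >= 0:
                value -= th * shocks[t - j]
        z.append(value)
    return z


# ARMA(1,1) response to a single unit shock.
response = simulate_arma([0.5], [0.25], [1.0, 0.0, 0.0])
```

Setting `theta` to an empty list reduces the loop to Equation (9), and setting `phi` to an empty list reduces it to Equation (10).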


6. Stationarity

When a time series appears to vary about some fixed level or mean, it is said to be
stationary in the mean [Box and Jenkins, 1970]. As noted in the previous discussion of the
decomposition method, a time series may exhibit a long-term trend, be it positive or
negative. In the case of the RTC Great Lakes graduation data, the observations fluctuate
about an upward-sloping regression line. This is called non-stationary behavior and can
be seen below:














[Figure: RTC Great Lakes Graduates, OCT94 - JUL98; observed values and trend line, vertical axis magnitude, Oct-94 to Jun-98]
ARMA models apply to horizontal, or stationary, data only [Box and Jenkins, 1970].
Fortunately, we can difference, or adjust, the original data to achieve stationarity
[Bowerman and O’Connell, 1979]. In practice, the trend is removed by taking successive
differences of the data to generate a new series. The following table shows the result of
taking one difference of the original data:








RTC GREAT LAKES DIFFERENCED DATA, OCT94-JUL98
[Table: Period | Month | Value $Z_t$ | Differenced Value $Z_t - Z_{t-1}$]


A graph of the new series shows that the positive long-term trend has been removed from
the data:

[Figure: RTC Great Lakes Differenced Values, OCT94 - JUL98; vertical axis magnitude]


If the dth difference of the original time series is stationary, that differenced series may
be represented by an ARMA model. The original non-stationary series is then described by
an autoregressive integrated moving average model, ARIMA, of order (p,d,q) [Box and
Jenkins, 1970]. In this case d = 1, the first difference of the original data.
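The "integrated" part of ARIMA is the inverse of differencing. Once a model has been fitted to the first differences, the levels are recovered by cumulative summation from a known starting value. A minimal sketch with a hypothetical function name follows. The usage line takes the first four monthly observations from the source data (October 1994 to January 1995: 2951, 1742, 2923, 2717).

```python
def undifference(first_value, diffs):
    """Invert a first difference: rebuild Z_1..Z_n from Z_1 and the values Z_t - Z_{t-1}."""
    z = [first_value]
    for d in diffs:
        z.append(z[-1] + d)    # Z_t = Z_{t-1} + (Z_t - Z_{t-1})
    return z


levels = [2951, 1742, 2923, 2717]
diffs = [levels[t] - levels[t - 1] for t in range(1, len(levels))]   # [-1209, 1181, -206]
rebuilt = undifference(levels[0], diffs)
```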

Mathematically, non-stationarity may be represented by a generalized autoregressive
operator $\varphi(B)$ [Box and Jenkins, 1970]:

$$\varphi(B) Z_t = \phi(B)(1 - B)^d Z_t = \theta(B) a_t, \qquad (12)$$

where

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$$
$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q$$

$\phi(B)$ is the autoregressive operator of order p and is assumed to be stationary. $\theta(B)$ is
the moving average operator of order q. It is also convenient to consider an extension of
the ARIMA model by adding a constant term $\theta_0$ [Box and Jenkins, 1970]:

$$\varphi(B) Z_t = \phi(B)(1 - B)^d Z_t = \theta_0 + \theta(B) a_t \qquad (13)$$


7. Model Selection
As mentioned earlier, p and q represent the order of the autoregressive and moving
average processes, respectively. We can attempt to determine the order of these
processes by visually examining the PACF and ACF graphs. In the PACF graph, the
number of statistically significant partial autocorrelation coefficients equals the order p
of the AR(p) model [Judge et al., 1985]. In our case there appear to be at least two
statistically significant coefficients, suggesting at least an AR(2) model. Similarly, the
order q of the MA(q) model is determined from the ACF graph [Box and Jenkins, 1970].
Upon examination of that graph, we find that the MA(3) model is a very likely candidate
for consideration. With d = 1, our best guess for the ARIMA model is the ARIMA(213)
process. This can be checked with the MINITAB software package:
ARIMA Model

Final Estimates of Parameters

Type        Estimate    St. Dev.    t-ratio
AR 1          1.1141      0.1323       8.42
AR 2         -0.8439      0.1446      -5.84
MA 1          1.3637      0.0234      58.27
MA 2         -1.0526      0.1613      -6.53
MA 3          0.7808      0.1514       5.16
Constant     21.6945      0.0133    1629.89

Differencing: 1 regular difference
No. of obs.: Original series 46, after differencing 45
Residuals: SS = 18396892 (backforecasts excluded)
           MS = 471715  DF = 39


The t-ratio is the ratio of each coefficient estimate to its standard error. It can be
thought of as the number of standard errors separating the estimate from zero. For
example, the AR(1) coefficient’s t-ratio of 8.42 implies that the estimate lies 8.42
standard errors from zero. A high t-ratio indicates that the corresponding p or q
coefficient is estimated precisely enough to be distinguished from zero. As a rule of thumb,
t-ratios greater than two in magnitude are considered significant [Gujarati, 1995]. In our
case, the lowest t-ratio has magnitude 5.16 and the highest has magnitude 1629.89, well
over two, suggesting appropriate coefficients. Indeed, MINITAB trials of other ARIMA
processes, such as ARIMA(212), ARIMA(211), and ARIMA(112), do not yield results as
good or as consistent as the ARIMA(213) model. It shall be our model henceforth.
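The t-ratios in the MINITAB output can be reproduced as estimate divided by standard error. The sketch below checks two of them, using values read from that output. The dictionary layout and the function name are illustrative assumptions.

```python
def t_ratio(estimate, std_error):
    """Number of standard errors separating an estimate from zero."""
    return estimate / std_error


# (estimate, standard error) pairs read from the MINITAB parameter table.
parameters = {
    "AR 2": (-0.8439, 0.1446),
    "MA 3": (0.7808, 0.1514),
}

for name, (estimate, se) in parameters.items():
    t = t_ratio(estimate, se)
    # Rule of thumb used in the text: |t| > 2 is treated as significant.
    print(f"{name}: t = {t:.2f}, significant: {abs(t) > 2}")
```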
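We can derive an expression for the ARIMA(213) model. Using the autoregressive
and moving average operators from Equation (13) and letting p = 2, d = 1, q = 3, we have:

$$(1 - \phi_1 B - \phi_2 B^2)(1 - B) Z_t = \theta_0 + (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3) a_t \qquad (14)$$

$$(1 - B - \phi_1 B + \phi_1 B^2 - \phi_2 B^2 + \phi_2 B^3) Z_t = \theta_0 + (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3) a_t \qquad (15)$$

$$\left[1 - (1 + \phi_1) B - (\phi_2 - \phi_1) B^2 - (-\phi_2) B^3\right] Z_t = \theta_0 + (1 - \theta_1 B - \theta_2 B^2 - \theta_3 B^3) a_t \qquad (16)$$

$$Z_t - (1 + \phi_1) Z_{t-1} - (\phi_2 - \phi_1) Z_{t-2} - (-\phi_2) Z_{t-3} = \theta_0 + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \theta_3 a_{t-3} \qquad (17)$$

$$Z_t = (1 + \phi_1) Z_{t-1} - (\phi_1 - \phi_2) Z_{t-2} - \phi_2 Z_{t-3} + \theta_0 + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \theta_3 a_{t-3} \qquad (18)$$

which is of the form

$$Z_t = \varphi_1 Z_{t-1} - \varphi_2 Z_{t-2} - \varphi_3 Z_{t-3} + \theta_0 + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \theta_3 a_{t-3}, \qquad (19)$$

with $\varphi_1 = 1 + \phi_1$, $\varphi_2 = \phi_1 - \phi_2$, and $\varphi_3 = \phi_2$.

Substituting the ARIMA process coefficients found earlier ($\phi_1 = 1.1141$, $\phi_2 = -0.8439$,
$\theta_0 = 21.6945$, $\theta_1 = 1.3637$, $\theta_2 = -1.0526$, $\theta_3 = 0.7808$), we obtain the
mathematical expression for the RTC Great Lakes graduate data:

$$Z_t = 2.1141 Z_{t-1} - 1.9580 Z_{t-2} + 0.8439 Z_{t-3} + 21.6945 + a_t - 1.3637 a_{t-1} + 1.0526 a_{t-2} - 0.7808 a_{t-3} \qquad (20)$$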
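The expansion from Equation (14) to Equation (18) is a polynomial multiplication in B. The sketch below multiplies the fitted AR operator by (1 - B) and recovers the lagged coefficients 2.1141, -1.9580 and 0.8439 of Equation (20). It assumes Python, and the helper names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists [c0, c1, c2, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


phi1, phi2 = 1.1141, -0.8439             # AR estimates from the MINITAB output
ar_operator = [1.0, -phi1, -phi2]        # 1 - phi1*B - phi2*B^2
one_minus_b = [1.0, -1.0]                # 1 - B
expanded = poly_mul(ar_operator, one_minus_b)
# Moving the lagged terms to the right-hand side flips their signs.
lag_coefficients = [-c for c in expanded[1:]]
print([round(c, 4) for c in lag_coefficients])  # [2.1141, -1.958, 0.8439]
```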


We can now use Equation (20) in spreadsheet form to obtain a tabular representation
of the expression, as well as its graphical interpretation:











RTC GREAT LAKES GRADUATE DATA, ARIMA 213 PROCESS CALCULATION
[Table: Period | Month | Observation $Z_t$ | Differenced Value | Trend Line Estimation | Noise | Predicted Value]
[Figure: Actual vs. ARIMA 213 Process with Constant; vertical axis magnitude, Oct-94 to Jun-98]


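8. Forecasting
We make a slight extension to Equation (20) to forecast future RTC Great Lakes
graduation rates based on the ARIMA 213 process. To forecast the value of the variable
$Z$ l periods ahead of origin t, we compute in spreadsheet form:

$$\hat{Z}_t(l) = 2.1141 Z_{t+l-1} - 1.9580 Z_{t+l-2} + 0.8439 Z_{t+l-3} + 21.6945 + a_{t+l} - 1.3637 a_{t+l-1} + 1.0526 a_{t+l-2} - 0.7808 a_{t+l-3} \qquad (21)$$

Here, any value $Z_{t+j}$ (j ≥ 1) not yet observed is replaced by its own forecast, and any
future shock $a_{t+j}$ (j ≥ 1) is replaced by its expected value of zero.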
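Below is a minimal sketch of the Equation (21) recursion, with the coefficients of Equation (20) hard-coded. The function name and argument layout are assumptions. In particular, the caller must supply the estimated shocks for past months (for example, from the noise column of the spreadsheet), because this sketch does not estimate them.

```python
AR_LAGS = (2.1141, -1.9580, 0.8439)     # coefficients on Z_{t-1}, Z_{t-2}, Z_{t-3}
MA_LAGS = (-1.3637, 1.0526, -0.7808)    # coefficients on a_{t-1}, a_{t-2}, a_{t-3}
CONSTANT = 21.6945


def forecast(z_history, a_history, horizon):
    """Forecasts 1..horizon steps ahead from Equation (21).

    z_history: observed graduates, oldest first (at least 3 values).
    a_history: estimated shocks for the same months (at least 3 values).
    Future shocks are replaced by their expected value, zero, and
    unobserved future Z values by their own forecasts.
    """
    if len(z_history) < 3 or len(a_history) < 3:
        raise ValueError("need at least three past values and shocks")
    z = list(z_history)
    a = list(a_history)
    out = []
    for _ in range(horizon):
        value = CONSTANT
        value += sum(c * z[-1 - k] for k, c in enumerate(AR_LAGS))
        value += sum(c * a[-1 - k] for k, c in enumerate(MA_LAGS))
        out.append(value)
        z.append(value)   # the forecast stands in for the unobserved Z
        a.append(0.0)     # E[a_{t+l}] = 0 for l >= 1
    return out
```

The three autoregressive coefficients sum to 1.0000 (2.1141 - 1.9580 + 0.8439). As a result, a flat recent history of 1000 graduates with zero shocks gives a one-step forecast of 1000 + 21.6945.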


Alternatively, we can allow the MINITAB software package to generate the desired
results:
A graph of actual and expected values, including the forecast figures, appears as
follows:


[Figure: RTC Great Lakes Observed vs Expected Values, ARIMA (213) Process, OCT94 - JAN99; observed and expected, vertical axis magnitude]














9. Forecasting Error

As with the multiplicative decomposition method, we back forecasted the results by
performing a similar analysis with only the first forty observed values. As expected,
different autoregressive and moving average coefficients were generated. A forecast was
run with these coefficients through period forty-six, and that forecast is compared to
the actual observed values of periods forty-one through forty-six:


[Figure: ARIMA 213 Process Back Forecasting Results; observed vs. forecast, vertical axis magnitude]





By visual inspection, the back forecast indicates that the ARIMA(213) process is
appropriate, since the back forecast values appear to track the actual observations. We
quantify the error inherent in our model with the MSE calculation below.


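The hold-out scheme used for back forecasting (fit on the first forty observations, then forecast periods forty-one through forty-six) can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, re-estimating the model on the training portion is left to MINITAB or another package, and it assumes period 1 corresponds to Oct-94, the first month on the graphs. Under that assumption, period 41 falls in Feb-98 and periods 42 to 46 fall in Mar-98 through Jul-98, which matches the period and month columns of the MSE table.

```python
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
FIRST_YEAR, FIRST_MONTH = 1994, 10  # assumption: period 1 is Oct-94


def period_label(period):
    """Return a 'Mon-YY' label for a 1-based monthly period number."""
    index = FIRST_YEAR * 12 + (FIRST_MONTH - 1) + (period - 1)
    year, month0 = divmod(index, 12)
    return f"{MONTHS[month0]}-{year % 100:02d}"


def holdout_split(series, n_train=40, horizon=6):
    """Split a series into a fitting window and the periods to back forecast."""
    if len(series) < n_train + horizon:
        raise ValueError("series too short for the requested split")
    train = series[:n_train]
    test = series[n_train:n_train + horizon]
    periods = list(range(n_train + 1, n_train + horizon + 1))
    return train, test, periods


# Placeholder series of 46 values, standing in for the observed data.
_, _, periods = holdout_split(list(range(46)))
print([period_label(p) for p in periods])
# ['Feb-98', 'Mar-98', 'Apr-98', 'May-98', 'Jun-98', 'Jul-98']
```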






The residual plot for the ARIMA(213) process, which follows, indicates that all trend components of the data have been eliminated, leaving only a random component [Bowerman and O’Connell, 1979]. No identifiable pattern could be found.


[Figure: ARIMA 213 Process Residual Plot. Vertical axis: MAGNITUDE.]


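The residual plot is judged here by visual inspection. A common numerical complement, which the original analysis did not use, is to compare the sample autocorrelations of the residuals with the approximate ±2/√n bounds that apply to white noise. The sketch below assumes the residuals are available as a plain list; the helper names are hypothetical. Even purely random residuals will occasionally exceed the bounds at some lag, so an isolated exceedance is a prompt for closer inspection rather than a verdict.

```python
import math


def residual_acf(residuals, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a residual series."""
    n = len(residuals)
    if not 0 < max_lag < n:
        raise ValueError("max_lag must be between 1 and len(residuals) - 1")
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("residuals are constant; autocorrelation undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def lags_outside_bounds(residuals, max_lag=12):
    """Return the lags whose autocorrelation exceeds 2/sqrt(n) in magnitude."""
    bound = 2 / math.sqrt(len(residuals))
    acf = residual_acf(residuals, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


# Residuals that simply alternate in sign are far from random.
print(lags_outside_bounds([1.0, -1.0] * 10, max_lag=3))
```

Residuals from an adequate model should show few, if any, lags outside the bounds.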



As before, we use the mean squared error (MSE) of the forecasts to quantify the model’s performance. The MSE for the back-forecast periods forty-one through forty-six is shown below:
ARIMA 213 PROCESS MSE CALCULATION

| Period | Month | Observed Value | Forecast Value | Error | Squared Error |
| 42 | Mar-98 | 2279 | 1573 | 706 | 498,436.00 |
| 43 | Apr-98 | 1998 | 1389 | 609 | 370,881.00 |
| 44 | May-98 | 2932 | 2922 | 10 | 100.00 |
| 45 | Jun-98 | 5042 | 5930 | -888 | 788,544.00 |
| 46 | Jul-98 | … | … | 1612 | 2,598,544.00 |

MSE: 4,858,805.00





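The squared-error arithmetic behind the table can be reproduced in a few lines. As an illustration only, the sketch below uses the four rows whose observed and forecast values are fully legible (periods 42 to 45), so its result is not the MSE reported for the full back-forecast window. The error is taken as observed minus forecast, matching the sign of the Error column; the function name is hypothetical.

```python
def mean_squared_error(observed, forecast):
    """Average of squared (observed - forecast) errors."""
    if len(observed) != len(forecast) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [o - f for o, f in zip(observed, forecast)]
    return sum(e * e for e in errors) / len(errors)


# Periods 42-45 (Mar-98 to Jun-98) from the MSE table.
observed = [2279, 1998, 2932, 5042]
forecast = [1573, 1389, 2922, 5930]
print(mean_squared_error(observed, forecast))  # 414490.25
```

Because the errors are squared, a single large miss (such as the 1612 error in Jul-98) dominates the MSE, which is worth keeping in mind when comparing models on a short hold-out window.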
We shall compare the ARIMA(213) process’ MSE with the multiplicative decomposition method’s MSE in the following section. For now, the back forecast and residual plot suggest that the ARIMA(213) process is, in itself, a pertinent model that can be used to forecast RTC Great Lakes’ future graduation rates.



















IV. DISCUSSION AND RECOMMENDATIONS 


A. Discussion 

We see from the back forecasting results that both the multiplicative decomposition method and the ARIMA(213) process yield adequate results. The multiplicative decomposition method is the more conservative of the two: it tends to underestimate RTC Great Lakes recruit graduations, whereas the ARIMA(213) process frequently overestimates them. The multiplicative decomposition method also has the lower mean squared error, and a lower MSE corresponds to a better-fitting model [Bowerman and O’Connell, 1979]. For this reason, one can conclude that the multiplicative decomposition method produces more accurate results. The method’s other appeal is its simplicity. Unlike the much more complicated ARIMA(213) process, the multiplicative decomposition method is built upon ratios and is easily performed without specialized software or advanced mathematical knowledge.

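To illustrate what “built upon ratios” means, the following sketch computes multiplicative seasonal indices by the textbook ratio-to-moving-average approach. Each observation is divided by a centered 12-month moving average, the ratios are averaged by calendar month, and the averages are rescaled to sum to 12. This is a generic version under stated assumptions (monthly data, at least two full years, hypothetical helper names); the steps used earlier in this thesis may differ in detail.

```python
def centered_moving_average(y, period=12):
    """2 x period centered moving average; None where it is undefined."""
    if period % 2:
        raise ValueError("this sketch handles even periods only")
    half = period // 2
    cma = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]  # period + 1 values
        # End points get half weight so the window spans exactly one period.
        cma[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return cma


def seasonal_indices(y, period=12):
    """Multiplicative seasonal indices normalized to average 1."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    cma = centered_moving_average(y, period)
    ratios = [[] for _ in range(period)]
    for t, (value, trend) in enumerate(zip(y, cma)):
        if trend is not None:
            ratios[t % period].append(value / trend)
    raw = [sum(r) / len(r) for r in ratios]
    scale = period / sum(raw)
    return [s * scale for s in raw]


# Made-up series: flat level of 100 with an alternating seasonal pattern.
pattern = [1.5, 0.5] * 6
series = [100 * pattern[t % 12] for t in range(36)]
print([round(s, 3) for s in seasonal_indices(series)])
```

For a purely seasonal series with a flat level, as in the usage example, the indices recover the seasonal pattern itself. Deseasonalizing then amounts to dividing each observation by its month’s index, which is the kind of ratio arithmetic that a spreadsheet handles easily.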
On the other hand, one should not disregard the ARIMA(213) process’ results altogether. In this study, the ARIMA(213) process suffered from a small number of raw data observations. Ideally, the number of observed values should be at least approximately 50 [Box and Jenkins, 1970]. As the number of observations grows, the ARIMA(213) process should yield increasingly accurate results [Box and Jenkins, 1970].







B. Recommendations 

It is recommended that the multiplicative decomposition and ARIMA analyses be performed and updated periodically. As the number of observations under study increases, the parameters of both models will change and should ultimately lead to better models [Box and Jenkins, 1970]. In the short term, the multiplicative decomposition method should be employed. As more data become available, however, the ARIMA process should be reevaluated and reconsidered.


C. Concluding Comments 
The use of forecasting techniques can provide information that helps alleviate many of the logistical problems at RTC Great Lakes and throughout the Navy. Knowledge of future months’ recruit graduation rates can ease many of the effects of RTC’s “summer surge.” These unbalanced loads can be anticipated and prepared for not only by RTC, but also by follow-on schools, apprentice training, and manpower placement for the fleet.
The results seen here can be applied to a “feedback mechanism” that would temper fluctuations and approximate the “level load” scenario, or constant output [Box and Jenkins, 1970]. This feedback mechanism is suggested for further study.